Abstract

Energy-efficient information transmission may be relevant to biological sensory signal processing as well as to low-power electronic devices. We explore its consequences in two different regimes. In an "immediate" regime, we argue that the information rate should be maximized subject to a power constraint, while in an "exploratory" regime, the transmission rate per power cost should be maximized. In the absence of noise, discrete inputs are optimally encoded into Boltzmann-distributed output symbols. In the exploratory regime, the partition function of this distribution is numerically equal to 1. The structure of the optimal code is strongly affected by noise in the transmission channel. The Arimoto-Blahut algorithm, generalized for cost constraints, can be used to derive and interpret the distribution of symbols for optimal energy-efficient coding in the presence of noise. We outline the possibilities and problems in extending our results to information coding and transmission in neurobiological systems.



1 Introduction: The Utility of Information 

There is increasing evidence that, far from being noisy and unreliable, spiking neurons can encode information about the outside world precisely in individual spike timings [de Ruyter et al., 1997], [Berry et al., 1997], [Buracas et al., 1998]. Estimates of the information transmitted by sensory neurons have often found them to be highly informative, sending 2 to 5 bits per spike, and quite reliable, using roughly half of the total entropy available in their spike trains ([Buracas et al., 1998] (monkey), [Warland, 1991] (cricket), [Rieke et al., 1991] and [Rieke et al., 1995] (frog), [Berry et al., 1998] (retina), [Strong et al., 1997] (blowfly H1), [Warland et al., 1997] (retina), [Reinagel et al., 1998] (cat); see [Rieke et al., 1997] for a discussion and review). So it is possible that there are behavioral regimes where information theory will be a powerful tool for predicting the structure of neural codes, provided the costs and constraints of biological computation are properly incorporated. Therefore, as a step towards a biologically relevant information theory, we examine the effect of energetic costs on the coding and transmission of information by discrete symbols, following important prior work by Levy [Levy et al., 1996] and Sarpeshkar [Sarpeshkar, 1998].



We have in mind a model of a sensory system where signals from the natural world are detected and encoded, and pass through a noisy channel before arriving at a decision-making receiver. Our results are equally relevant to low-power electronic devices, such as mobile telephones, that are constrained by finite battery life.

In general terms, the role of a sensory system in the process of information use by an organism is summarized in Fig. 1. Information about the environment is detected by sensors and encoded for transmission through an information channel to a control system. For example, the retina detects patterns of light, which are encoded by ganglion cells for transmission through the optic nerve to the brain. We might expect evolution (or engineering) to produce systems that make an "optimal" choice both of the amount of information to transmit and, given that amount, of the kind of information to transmit. The amount of information is quantified in classical information theory by the mutual information $I(S;Z)$, and by the rate $R = I(S;Z)/N$ during a period in which $N$ symbols are transmitted [Cover et al., 1991].



As we will describe, information theory can be used to determine the minimum power necessary to transmit at a given rate $R$, or the minimum energy needed to transmit a given amount $I$ of information. The rate at which the organism should operate is determined by a tradeoff between the value and cost of the transmitted information. We outline two different behavioral regimes in which these tradeoffs lead to different coding strategies.



Immediate regime: In some activities an organism is engaged in a time-critical task involving rapidly changing environmental states, and its performance depends strongly on its rate of sensory information acquisition $R$. For example, a cheetah's effectiveness in catching a gazelle, and hence in procuring metabolic gains from food, might be expected to improve with increasing $R$. However, acquiring sensory information also incurs a metabolic cost at some rate $E$, and unless the resulting rate $V$ of metabolic gain for the organism is great enough, the expenditure may not be worthwhile. While numerous factors affect the value of information, we will focus on how it varies with the rate $R$ and consider a value function $V(R)$ with all other variables held constant.

In general, we expect that the value $V(R)$ of sensory information will increase monotonically, but not linearly, with $R$. At a low enough rate of acquisition, the sensory modality will be of no use to the organism. For instance, if the cheetah sees half as well, it will not capture half as many gazelles; it will starve. Conversely, at a high enough information rate the value should saturate, as there is only so much meat in a gazelle. Balancing the marginal increase in value to the organism, $dV(R)/dR$, against the marginal increase in energy expense, $dE(R)/dR$, yields some optimal rate $R^*$ for such an immediate regime. Alternatively, there may be some structural constraints, such as the signal-to-noise ratio of the sensory modality or the processing speed of the biological circuitry, that limit the attainable rate to some maximum $R_{\max}$.
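A minimal formalization of this balance, under the assumption (not stated in the text) that the organism's net benefit rate is simply $V(R) - E(R)$, with both measured in the same metabolic units:

```latex
R^* = \arg\max_{R}\,\bigl[V(R) - E(R)\bigr]
\quad\Longrightarrow\quad
\left.\frac{dV(R)}{dR}\right|_{R = R^*} = \left.\frac{dE(R)}{dR}\right|_{R = R^*}
```

The first-order condition identifies a maximum when $V''(R^*) < E''(R^*)$, which holds, for instance, in the saturating part of $V(R)$, given that $E(R)$ is convex.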

We cannot compute $R^*$ without knowing the value function $V(R)$, and we cannot compute $R_{\max}$ without knowing the structural constraints. However, the smaller of these two values will set the organism's rate of sensory information acquisition, and whatever it is, an optimal code will minimize the energy cost for this rate. To study the structure of such codes we can simply ask how to minimize the power required to transmit at a given information rate. As we shall see, $E(R)$ is an increasing convex function, so this is equivalent to determining the maximum rate $R(E)$ of information transmission given a constraint of average energy $E$ per symbol. (See Fig. 2.)

Exploratory regime: In many other situations, the relevant environmental state is changing slowly and an organism is not faced with any urgent tasks. Here, it is free to choose the rate at which it surveys its surroundings, as well as the time it spends before taking a behavioral decision that changes its environment. The quality of exploration will depend on the total amount of sensory information acquired. Better exploration will allow the organism to achieve more appropriate behavior, but continued exploration will involve a cost in metabolic energy as well as in opportunities for other behavior. Therefore, there will be an optimal amount of information, $I^*$, that the organism should acquire, where the marginal value of exploration matches its marginal cost.

We cannot compute $I^*$ without knowing the value of exploration achievable using an amount $I$ of information. However, whatever the value of $I^*$, and independently of the details of the activity, an "optimal" sensory system will transmit that information at the rate which minimizes the cumulative energy cost $E_c(I^*)$. The convexity of $E(R)$ implies that this is achieved by a sensory system that transmits at a fixed rate, as any variations in the rate will result in a higher cumulative energy cost. Since transmitting $I^*$ at rate $R$ takes $I^*/R$ symbols, each costing $E(R)$, this optimal rate of sensory information acquisition will minimize $E_c(I^*) = I^* E(R)/R$, or equivalently, maximize $R(E)/E$.
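The fixed-rate claim follows from Jensen's inequality. As a sketch, suppose the $I^*$ units of information are delivered in $N$ symbols, split into segments of $N_i$ symbols sent at rates $R_i$ (so $\sum_i N_i R_i = I^*$); the symbols $N_i$, $R_i$, $w_i$ and $\bar{R}$ are notation introduced only for this derivation:

```latex
E_c = \sum_i N_i\,E(R_i) = N \sum_i w_i\,E(R_i)
\;\ge\; N\,E\Bigl(\sum_i w_i R_i\Bigr) = N\,E(\bar{R}),
\qquad w_i = \frac{N_i}{N}, \quad \bar{R} = \sum_i w_i R_i = \frac{I^*}{N}

\text{At a fixed rate } R:\quad N = \frac{I^*}{R}, \qquad
E_c(I^*) = I^*\,\frac{E(R)}{R}
\;\Longrightarrow\;
\min_R \frac{E(R)}{R} \iff \max_E \frac{R(E)}{E}
```

So a constant rate $\bar{R}$ over the same $N$ symbols never costs more than a varying one, and the remaining choice of rate reduces to the ratio above.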

Low power devices: Both the immediate and the exploratory regimes apply to 
low power electronic devices, such as mobile telephones or laptop computers. The 
finite battery lifetime of these devices puts a premium on energy efficiency. The 
immediate regime is equivalent to an "on-line" mode, where the information rate 
of the device is determined by the application but the total amount of information 
is variable. The exploratory regime is equivalent to an "off-line" or "batch" mode, 
where the total amount of information to transmit is set, but the rate is variable. 
Summary: A system operating at any given information rate $R$ should transmit using the minimum energy $E(R)$ required for that rate, all other constraints being held equal. In immediate activities the optimal rate is determined by the tradeoff between the gain realizable at rate $R$ and the cost $E(R)$. However, in an exploratory regime, the optimal rate maximizes $R(E)/E$, independently of the details of the activity. The next sections describe the general structure of energy-efficient codes.



2 Metabolically constrained capacity and coding 

In this section we consider the consequences of metabolic efficiency in information transmission. We will not address the problem of determining what information to transmit, but abstract the mapping $S \to X$ in Fig. 1 as performing this task. From this point of view, we can treat $X$ as a sequence of symbols to be encoded into a sequence $Y$ of channel inputs, which are transmitted to produce an output sequence $Z$. Denote the elements of these sequences at a specific time as $x$, $y$, and $z$. Channel transmission is both noisy and energetically costly.

Assume a discrete memoryless channel, modeled by crossover probabilities $Q_{k|j} = \Pr\{z = z_k \mid y = y_j\}$ giving the probability that a channel input symbol $y_j$ results in a channel output $z_k$. The organism as a whole incurs a variety of energetic expenditures at all times, but we will focus on the costs of operating the sensory system, these being relevant to the optimization considered here. The energetic cost of transmitting information can be referred to the input $Y$, to the output $Z$, or may even be a function of both $Y$ and $Z$. However, we choose to associate energy costs $\{E_1, \ldots, E_n\}$ with input symbols $\{y_1, \ldots, y_n\}$. This entails no loss of generality, since for arbitrary costs $E_{jk}$ depending on both input $y_j$ and output $z_k$ we may simply take $E_j$ as the expected cost $E_j = \sum_k Q_{k|j} E_{jk}$ for use of symbol $y_j$.
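A small sketch of this reduction: with the crossover probabilities stored as a matrix `Q[j, k]` $= Q_{k|j}$ and the joint costs as `E_jk[j, k]` $= E_{jk}$, each effective input energy is a row-wise weighted sum. The function name, the 3-symbol channel and the costs are hypothetical, chosen only for illustration.

```python
import numpy as np

def effective_input_energies(Q, E_jk):
    """E_j = sum_k Q[j, k] * E_jk[j, k], the expected cost of sending input y_j.

    Q[j, k] = Pr(z_k | y_j), so each row of Q sums to 1; E_jk[j, k] is the cost
    incurred when y_j is sent and z_k is received.
    """
    Q = np.asarray(Q, dtype=float)
    E_jk = np.asarray(E_jk, dtype=float)
    assert np.allclose(Q.sum(axis=1), 1.0), "each row of Q must be a probability distribution"
    return (Q * E_jk).sum(axis=1)

# Illustrative 3-symbol channel; here the cost depends only on the output (E_jk = U_k).
Q = np.array([[0.8, 0.2, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.2, 0.8]])
U = np.array([1.0, 2.0, 3.0])
E = effective_input_energies(Q, np.tile(U, (3, 1)))
print(E)
```

When costs attach only to outputs, the result is simply `Q @ U`, i.e. $E_j = \sum_k U_k Q_{k|j}$.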

Our goal is to find, for any given energy $E$, the maximum achievable mutual information $I(X;Z)$ between the signal $X$ and the channel output $Z$, with expected energy cost $\bar{E} \le E$. However, it can be shown that $I(X;Z) \le I(Y;Z)$, with equality when $X$ can be completely determined from $Y$ [Cover et al., 1991]. Intuitively, the encoding from $X$ to $Y$ should exploit the channel characteristics, but without loss of information about $X$. Assuming the mapping from $X$ to $Y$ is indeed lossless, maximizing $I(X;Z)$ reduces to maximizing $I(Y;Z)$. Correlations within the sequence $Y$ will always decrease the total amount of transmitted information, since this is bounded above by the entropy of $Y$. So to maximize $I(Y;Z)$ we can assume that the symbols of $Y$ are independently drawn from a distribution $q(y)$ over the channel inputs. But both $I(Y;Z)$ and $\bar{E}$ depend upon $q(y)$, so, formally, the problem is to determine the function

$$C(E) = \frac{1}{N} \max_{q(y)\,:\,\bar{E} \le E} I(Y;Z), \qquad \bar{E} = \sum_j q(y_j)\,E_j, \tag{1}$$

where $C(E)$ is called the channel capacity-cost function [Blahut, 1987]. It is evident from (1) and the statistical independence of symbols in $Y$ that $C(E) = R(E)$, where $R(E)$ is the constrained transmission rate discussed earlier. The channel coding theorems of classical information theory assert that reliable transmission of information is possible at any rate less than $R(E)$, and at no rate greater than $R(E)$. Our focus is not on reliable transmission per se, but simply on the maximum per-symbol rate $R(E)$ at which mutual information $I(Y;Z)$ can be established given the constraint $\bar{E} \le E$.

We now address, first in the noiseless case, then for a noisy channel, the related problems of: (1) characterizing $C(E)$, (2) determining the distribution $q_E(y)$ which achieves $C(E)$, and (3) finding the maximum of $C(E)/E$. The first two problems are of interest because an energy-optimal device or organism should achieve $C(E)$ for whatever energy $E$ it is operating at, requiring a very particular distribution over $y$. The third problem is interesting because it allows us to determine both the rate $C^*$ and energy $E^*$ at which an energy-optimal organism would operate in the exploratory regime, regardless of the details of its activity.



2.1 Efficient Noiseless Transmission 

In the absence of noise, the channel input and output are equal ($Y = Z$), and the mutual information $I(Y;Z)$ equals the channel input entropy $H(Y)$. So, finding the capacity at fixed energy reduces to maximizing the entropy of $Y$ at fixed energy. Correlations within the sequence $Y$ will always decrease the entropy, so we can assume that the symbols in $Y$ are drawn independently from some distribution $q$. The purpose of the encoding process $X \to Y$ is to implement a deterministic map between the signal $X$ and the channel input $Y$, in such a way that the symbols of $Y$ are statistically independent and have a distribution $q$. We will not discuss how this encoding is performed in practice and will focus instead on the structure of the optimal distribution $q$.¹ Then the per-symbol information rate (or entropy) and energy involved in the transmission are

$$H = -\sum_{j=1}^{n} q_j \ln q_j, \qquad E = \sum_{j=1}^{n} q_j E_j,$$

where $q_j = q(y_j)$. In the immediate regime we maximize $H$ at fixed $E$, while in the exploratory regime we maximize $H/E$.



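Immediate regime: Entropy maximization at fixed average cost is a classic problem, solvable using the method of Lagrange multipliers by defining the function

$$G = -\sum_{j=1}^{n} q_j \ln q_j - \beta\left(\sum_{j=1}^{n} q_j E_j - E\right) + \lambda\left(\sum_{j=1}^{n} q_j - 1\right) \tag{2}$$

and setting its derivatives with respect to $\beta$, $\lambda$, and all the $q_j$ equal to zero. Setting $\partial G/\partial\lambda = 0$ ensures that $q$ remains a probability distribution, and $\partial G/\partial\beta = 0$ fixes the average energy at $E$. The remaining condition, $\partial G/\partial q_j = -\ln q_j - 1 - \beta E_j + \lambda = 0$, gives $q_j \propto e^{-\beta E_j}$; solving all three conditions simultaneously yields

$$q_j = \frac{e^{-\beta E_j}}{Z}, \qquad Z = \sum_{j=1}^{n} e^{-\beta E_j}, \tag{3}$$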



¹ There are standard algorithms in coding theory that perform such mappings between $X$ and $Y$ [Cover et al., 1991]. Most such algorithms are not biologically plausible, and it would be very interesting to determine whether suitable encoding algorithms can be implemented by biological hardware.
where the normalization factor $Z$ is known as the partition function and $\beta$ is implicitly determined by demanding that the average energy be $E$. We are simply recovering the commonplace fact of statistical physics that entropy is maximized at fixed average energy by a Boltzmann distribution with an "inverse temperature" $\beta$ defined by (3). Standard results about Boltzmann distributions then tell us that the maximum information rate at fixed energy, $H(E)$, is a concave function of $E$, increasing from 0 at $E_{\min} = \min_j(E_j)$ (if a single symbol has the lowest energy) to a maximum $H_{\max} = \ln n$ at $E_{\max} = \sum_{j=1}^{n} E_j/n$. (In the language of statistical physics, the "heat capacity" is positive.) Larger energies ($E > E_{\max}$) lower the entropy. (See Fig. 2.)
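The immediate-regime solution is easy to compute directly. The sketch below (hypothetical helper names, energies chosen for illustration) finds $\beta$ by bisection so that the Boltzmann distribution (3) has mean energy $E$, relying on the fact that the mean energy decreases monotonically as $\beta$ increases. Entropies are in nats.

```python
import numpy as np

def boltzmann(energies, beta):
    """Return (q, Z) with q_j = exp(-beta * E_j) / Z, computed without overflow."""
    a = -beta * energies
    shift = a.max()
    w = np.exp(a - shift)
    return w / w.sum(), np.exp(shift) * w.sum()

def immediate_code(energies, E_target, tol=1e-12):
    """Maximum-entropy (noiseless, immediate-regime) code with mean energy E_target."""
    energies = np.asarray(energies, dtype=float)
    assert energies.min() < E_target < energies.max(), "target must lie inside the energy range"
    mean = lambda b: boltzmann(energies, b)[0] @ energies   # decreasing in b
    lo, hi = -1.0, 1.0
    while mean(lo) < E_target:       # push lo down until <E>(lo) >= target
        lo *= 2.0
    while mean(hi) > E_target:       # push hi up until <E>(hi) <= target
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean(mid) > E_target:
            lo = mid
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    q, _ = boltzmann(energies, beta)
    nz = q > 0
    H = -np.sum(q[nz] * np.log(q[nz]))   # information per symbol, in nats
    return beta, q, H

E = np.arange(1, 7)                      # illustrative symbol energies E_j = j
for E_target in (1.5, 2.5, 3.5):
    beta, q, H = immediate_code(E, E_target)
    print(f"E = {E_target:.1f}: beta = {beta:+.4f}, H = {H:.4f} nats (ln 6 = {np.log(6):.4f})")
```

At $E = E_{\max} = 3.5$ the solver returns $\beta \approx 0$ and $H = \ln 6$, the top of the $H(E)$ curve; smaller targets give positive $\beta$ and lower entropy.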

Exploratory regime: In the exploratory regime, we maximize the information transmitted per energy cost. So we should extremize

$$G = \frac{-\sum_{j=1}^{n} q_j \ln q_j}{\sum_{j=1}^{n} q_j E_j} + \lambda\left(\sum_{j=1}^{n} q_j - 1\right) \tag{4}$$

with respect to $\lambda$ and all the $q_j$. If $G$ is maximized by some distribution $q$, there is a corresponding information rate $H$ and power consumed $E$. We have already shown that for fixed $E$ the information rate is maximized by the Boltzmann distribution (3). So $q$ must be Boltzmann for some inverse temperature $\beta$. This reduces the multivariable optimization problem of maximizing $G$ to a single equation: choose $q$ to be Boltzmann as in (3) and demand that $dG/d\beta = 0$. This condition is easy to solve using the partition function (3) and the identity $H = \beta E + \ln Z$, which give $G = H/E = \beta + \ln Z/E$ and $d\ln Z/d\beta = -E$. Maximizing with respect to $\beta$ then gives the condition $-(\ln Z/E^2)\,dE/d\beta = 0$. Since $dE/d\beta$ equals minus the variance of the symbol energies, which is nonzero unless all $E_j$ are equal, solutions which maximize $G$ satisfy

$$\ln Z = 0 \quad\Longleftrightarrow\quad Z = 1. \tag{5}$$

Thus, information transmission is optimized in the exploratory regime by a Boltzmann distribution with unit partition function. This selects a particular energy $E^*$ and associated entropy $H^*$. Despite the ubiquity of the partition function in statistical physics, this is the only instance, insofar as the authors are aware, of a clear physical meaning assigned to a particular numerical value of $Z$.
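Condition (5) is a single equation in $\beta$ and can be solved numerically. A minimal sketch, assuming at least two symbols with strictly positive energies (so that $\ln Z$ falls monotonically from $\ln n > 0$ at $\beta = 0$ and the root is unique); the helper names are hypothetical:

```python
import numpy as np

def log_partition(energies, beta):
    """ln Z(beta), where Z = sum_j exp(-beta * E_j), via a stable log-sum-exp."""
    a = -beta * np.asarray(energies, dtype=float)
    m = a.max()
    return m + np.log(np.exp(a - m).sum())

def exploratory_beta(energies, tol=1e-12):
    """Solve ln Z(beta) = 0, i.e. eq. (5), by bisection (assumes n >= 2 and all E_j > 0)."""
    energies = np.asarray(energies, dtype=float)
    assert len(energies) >= 2 and energies.min() > 0
    lo, hi = 0.0, 1.0
    while log_partition(energies, hi) > 0:   # ln Z decreases in beta: find an upper bracket
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if log_partition(energies, mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

E = np.arange(1, 7)            # energies E_n = n, as in the six-symbol example of Sec. 3.2
beta = exploratory_beta(E)
q = np.exp(-beta * E)          # Z = 1, so q is already normalized
H, E_bar = -(q * np.log(q)).sum(), q @ E
print(f"beta = {beta:.4f}, E* = {E_bar:.4f}, H* = {H:.4f} nats, H*/E* = {H / E_bar:.4f}")
```

With $Z = 1$ we have $\ln q_j = -\beta E_j$, so $H^* = \beta E^*$ and the maximal information per unit energy equals $\beta$ itself. For these energies the root comes out near 0.685, the value quoted in Sec. 3.2.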



2.2 Efficient Noisy Transmission 



Now consider the noisy channel. Once again, the capacity will be maximized when the symbols of the sequence $Y$ are chosen independently from some $q(y)$, because correlations reduce transmitted information. With this assumption, and the channel crossover probabilities defined in Sec. 2, the channel capacity (1) at a fixed transmission energy becomes

$$C(E) = \max_{q(y)\,:\,\bar{E} \le E} \left[ -\sum_j q_j \ln q_j + \sum_{jk} q_j Q_{k|j} \ln P_{j|k} \right], \tag{6}$$

where $P_{j|k} = \Pr\{Y = y_j \mid Z = z_k\}$ is given by

$$P_{j|k} = \frac{P(y = y_j, z = z_k)}{P(z = z_k)} = \frac{q_j Q_{k|j}}{\sum_i q_i Q_{k|i}}. \tag{7}$$
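Equations (6) and (7) are a rewriting of the usual mutual information $I(Y;Z)$. A quick numerical check of that identity, on a randomly generated channel (an arbitrary test case, not one from the paper), with rows of `Q` indexed by inputs as in the text; the function names are hypothetical:

```python
import numpy as np

def mi_direct(q, Q):
    """I(Y;Z) = sum_jk q_j Q[j,k] ln(Q[j,k] / p_k), with p_k = sum_j q_j Q[j,k]."""
    p = q @ Q
    joint = q[:, None] * Q
    ratio = np.divide(Q, p, out=np.ones_like(Q), where=p > 0)
    m = joint > 0
    return float(np.sum(joint[m] * np.log(ratio[m])))

def mi_eq6(q, Q):
    """The same quantity written as in eq. (6): H(Y) + sum_jk q_j Q[j,k] ln P_{j|k}."""
    joint = q[:, None] * Q
    p = joint.sum(axis=0)
    P = np.divide(joint, p, out=np.zeros_like(joint), where=p > 0)   # posterior, eq. (7)
    m = joint > 0
    H_Y = -np.sum(q[q > 0] * np.log(q[q > 0]))
    return float(H_Y + np.sum(joint[m] * np.log(P[m])))

rng = np.random.default_rng(0)
Q = rng.random((4, 5))
Q /= Q.sum(axis=1, keepdims=True)    # random channel: row j is Pr(z_k | y_j)
q = rng.random(4)
q /= q.sum()                         # random input distribution
print(mi_direct(q, Q), mi_eq6(q, Q))
```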
The maximization is complicated by the dependence of $P_{j|k}$ on $q_j$. An insight due to Arimoto [Arimoto, 1972] and Blahut [Blahut, 1972], which still applies despite the energy constraint, is that (6) can also be written as the double maximization

$$C(E) = \max_{q(y),\,P\,:\,\bar{E} \le E} \left[ -\sum_j q_j \ln q_j - H(Y|Z) \right], \tag{8}$$

where we define $H(Y|Z) = \sum_j q_j H_j = -\sum_{jk} q_j Q_{k|j} \ln P_{j|k}$. The advantage of this form is that the capacity can be computed numerically by an iterative algorithm which alternately maximizes with respect to $q_j$ and $P_{j|k}$ while holding the other variable fixed. Each of these maximizations can be carried out using Lagrange multipliers, as in the previous derivations. The resulting algorithm can be summarized as:



1. Choose an arbitrary nonzero $q^{(0)}$.

2. For $t = 0, 1, 2, \ldots$ repeat:

   (a) $P^{(t)}_{j|k} = q^{(t)}_j Q_{k|j} \big/ \sum_i q^{(t)}_i Q_{k|i}$

   (b) $q^{(t+1)}_j = e^{-\beta E_j - H^{(t)}_j} \big/ \sum_i e^{-\beta E_i - H^{(t)}_i}$, where $H^{(t)}_j = -\sum_k Q_{k|j} \ln P^{(t)}_{j|k}$, with $\beta$ chosen so that $\sum_j q^{(t+1)}_j E_j = E$

   (c) If $q^{(t+1)}$ is close to $q^{(t)}$, stop.

The correctness of this generalization of the classic Arimoto-Blahut algorithm is discussed in [Blahut]. In maximizing with respect to $q$ in step (2b), the terms $H_j$ and the energy costs $E_j$ play identical roles. Indeed, $H(Y|Z)$ is essentially the average cost due to information loss in noise, leading to the Boltzmann-like distribution in step (2b). This algorithm yields the capacity at fixed energy $C(E)$, and the associated distribution $q_E(y)$. In the exploratory regime, numerical optimization of $C(E)/E$ gives an optimal energy $E^*$, associated capacity $C^*$, and distribution $q_{E^*}(y)$.
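The iteration can be sketched in NumPy as follows. This is an illustrative implementation under stated assumptions, not the authors' code: the constraint is treated as $\bar{E} \le E$ with $\beta \ge 0$ (so $\beta = 0$ when the budget is slack), $\beta$ in step (2b) is found by bisection, capacities are in nats per symbol, all function names are hypothetical, and `exploratory_optimum` approximates $E^*$ by scanning a grid of energies rather than by a continuous optimizer.

```python
import numpy as np

def _tilted(log_w, energies, beta):
    """Distribution proportional to exp(log_w_j - beta * E_j), computed stably."""
    a = log_w - beta * energies
    w = np.exp(a - a.max())
    return w / w.sum()

def _beta_for_budget(log_w, energies, E_budget, iters=100):
    """Smallest beta >= 0 whose tilted distribution has mean energy <= E_budget."""
    mean = lambda b: _tilted(log_w, energies, b) @ energies   # decreases as b grows
    if mean(0.0) <= E_budget:
        return 0.0                                 # energy budget is slack
    lo, hi = 0.0, 1.0
    for _ in range(200):                           # expand the bracket
        if mean(hi) <= E_budget:
            break
        hi *= 2.0
    for _ in range(iters):                         # bisection
        mid = 0.5 * (lo + hi)
        if mean(mid) > E_budget:
            lo = mid
        else:
            hi = mid
    return hi

def mutual_info(q, Q):
    """I(Y;Z) in nats, with Q[j, k] = Pr(z_k | y_j)."""
    p = q @ Q
    joint = q[:, None] * Q
    m = joint > 0
    ratio = np.divide(Q, p, out=np.ones_like(Q), where=p > 0)
    return float(np.sum(joint[m] * np.log(ratio[m])))

def blahut_with_cost(Q, energies, E_budget, tol=1e-10, max_iter=5000):
    """Return (C(E), q_E): capacity-cost in nats/symbol and the input distribution."""
    Q = np.asarray(Q, dtype=float)
    energies = np.asarray(energies, dtype=float)
    assert E_budget > energies.min(), "budget must exceed the cheapest symbol energy"
    q = np.full(Q.shape[0], 1.0 / Q.shape[0])      # step 1: arbitrary nonzero start
    for _ in range(max_iter):
        joint = q[:, None] * Q                     # step (a): P[j, k] = Pr(y_j | z_k)
        p = joint.sum(axis=0)
        P = np.divide(joint, p, out=np.zeros_like(joint), where=p > 0)
        # H_j = -sum_k Q[j, k] ln P[j, k]; terms with Q[j, k] = 0 are dropped
        H = -np.sum(np.where(Q > 0, Q * np.log(np.maximum(P, 1e-300)), 0.0), axis=1)
        beta = _beta_for_budget(-H, energies, E_budget)   # step (b)
        q_new = _tilted(-H, energies, beta)
        converged = np.max(np.abs(q_new - q)) < tol       # step (c)
        q = q_new
        if converged:
            break
    return mutual_info(q, Q), q

def exploratory_optimum(Q, energies, grid):
    """Grid search for (E*, C*, C*/E*) maximizing C(E)/E."""
    results = [(E, blahut_with_cost(Q, energies, E)[0]) for E in grid]
    E_star, C_star = max(results, key=lambda ec: ec[1] / ec[0])
    return E_star, C_star, C_star / E_star

E_star, C_star, ratio = exploratory_optimum(np.eye(6), np.arange(1, 7), np.arange(1.05, 3.5, 0.01))
print(f"E* ~ {E_star:.2f}, C* ~ {C_star:.4f} nats, C*/E* ~ {ratio:.4f}")
```

The last lines check the sketch against the noiseless result: with an identity channel and energies $E_j = j$, the best ratio $C(E)/E$ on the grid should come out close to $\beta \approx 0.685$ at $E$ near 1.92, since a Boltzmann code with $Z = 1$ has $H/E = \beta$. Passing a noisy `Q` instead is how one would explore channels like those in Sec. 3.2.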



Summary: Given the channel noise and the symbol energies, the capacity function $C(E)$ can be computed. In the noiseless case, it is achieved by a Boltzmann distribution. For a noisy channel, $C(E)$ is computed numerically, and in all cases the distributions produced by the algorithm above achieve metabolically optimal transmission. In the exploratory regime, the rate should be chosen to maximize $C(E)/E$, which is achieved in the noiseless case when $Z$ equals 1. We have not discussed the implementation of the encoding from $X$ into $Y$, which may be realized by either arithmetic or block coding methods [Cover et al., 1991]. How well this mapping can be approximated by biological organisms is a question for investigation.



3 Characteristics of the efficient code 

In this section, we consider some of the properties of energy efficient codes. First, we 
show that the optimal code is invariant under certain changes in the symbol energies. 
Then we illustrate some of the effects of adding noise. 
3.1 Energy invariances



The metabolically efficient distribution on code symbols is invariant under some transformations of the energy model in both the immediate and exploratory regimes. Regardless of whether the energy costs are assigned to the channel inputs $y_j$ or the channel outputs $z_k$, the optimal immediate symbol distributions are independent of a constant shift in the energies ($E_k \to E_k + \Delta$). In the exploratory regime, the optimal distribution is independent of rescalings of the energies ($E_k \to \lambda E_k$). This is shown as follows.



Immediate regime: In the immediate regime we fix the average transmission energy $E$, and carry out the Arimoto-Blahut optimization algorithm of Sec. 2.2. First suppose that symbol energies $E_j$ have been assigned to the channel inputs. We choose an arbitrary starting distribution $q^{(0)}$ for the channel inputs and iteratively perform steps (a) and (b) of the algorithm to find improved distributions $q^{(t+1)}$. Step (a) does not involve the energies at all, so it is unaffected by any change in them. Step (b), which computes $q^{(t+1)}$, is manifestly invariant under a constant shift of the input energies $E_j \to E_j + \Delta$, accompanied by a shift of the average transmission energy $E \to E + \Delta$: the shift multiplies every weight $e^{-\beta E_j - H_j}$ by the same factor $e^{-\beta\Delta}$, which cancels on normalization, and the energy condition $\sum_j q_j (E_j + \Delta) = E + \Delta$ is satisfied by the same $\beta$. So the energy-optimal immediate distribution is invariant under a simultaneous constant shift of all the symbol energies and the average energy. Next suppose that symbol costs $U_k$ have been assigned to the channel outputs $z_k$. The average energy expended by a channel input $y_j$ is $E_j = \sum_k U_k Q_{k|j}$. Since this relation is linear and $\sum_k Q_{k|j} = 1$, a constant shift by $\Delta$ of the output energies $U_k$ translates to a constant shift by $\Delta$ of the input energies $E_j$, leaving the optimal immediate distribution invariant.



Exploratory regime: Suppose the channel inputs have energy $E_j$ and that $C(E)$ is the channel capacity at fixed transmission energy $E$. We compute the exploratory regime optimum by setting

$$\frac{d(C(E)/E)}{dE} = \frac{1}{E}\frac{dC(E)}{dE} - \frac{C(E)}{E^2} = 0. \tag{9}$$

It follows from the Arimoto-Blahut algorithm that the optimal input distribution at fixed transmission energy is invariant under a combined rescaling of both the input symbol energies and the average transmission energy ($E_j \to \lambda E_j$ and $E \to \lambda E$). To see this, observe that step (a) of the algorithm does not involve the energies, while the condition in step (b) is solved for the new energies by rescaling $\beta \to \beta/\lambda$. Since the capacity is a function of only the distribution of code symbols and not directly of the symbol energies, we conclude that the capacity for the system with rescaled energies, $C_\lambda$, satisfies the relation

$$C_\lambda(\lambda E) = C(E). \tag{10}$$

To find the optimal exploratory distribution with the rescaled energies we must solve $d(C_\lambda(E)/E)/dE = 0$. Changing variables to $\tilde{E} = E/\lambda$ and using (10), we find that

$$\frac{d(C_\lambda(E)/E)}{dE} = \frac{1}{\lambda^2}\frac{d(C_\lambda(\lambda\tilde{E})/\tilde{E})}{d\tilde{E}} = \frac{1}{\lambda^2}\frac{d(C(\tilde{E})/\tilde{E})}{d\tilde{E}}. \tag{11}$$

Since this equation is proportional to (9), the optimal exploratory distribution is invariant under a rescaling of the input energies. If we assign costs $U_k$ to the output symbols, linearity of the relation $E_j = \sum_k U_k Q_{k|j}$ between input and output costs implies that rescaling the output energies rescales the effective input energies and again leaves the exploratory optimum invariant.
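A numerical check of both invariances in the noiseless case, where the optimal codes are the Boltzmann distributions of Sec. 2.1 (the noisy case rests on the Arimoto-Blahut argument above and is not tested here). The energies and the values of $\Delta$ and $\lambda$ are arbitrary illustrative choices, and the helper names are hypothetical.

```python
import numpy as np

def _boltzmann(E, beta):
    a = -beta * E
    w = np.exp(a - a.max())              # stable normalization
    return w / w.sum()

def _bisect_decreasing(f, lo, hi, tol=1e-13):
    """Root of a decreasing function f with f(lo) > 0 > f(hi)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) > 0 else (lo, mid)
    return 0.5 * (lo + hi)

def immediate_q(E, E_target):
    """Noiseless immediate code: Boltzmann q with mean energy E_target."""
    beta = _bisect_decreasing(lambda b: _boltzmann(E, b) @ E - E_target, -50.0, 50.0)
    return _boltzmann(E, beta)

def exploratory_q(E):
    """Noiseless exploratory code: Boltzmann q with ln Z = 0 (all E_j > 0)."""
    def log_Z(b):
        a = -b * E
        return a.max() + np.log(np.exp(a - a.max()).sum())
    return _boltzmann(E, _bisect_decreasing(log_Z, 0.0, 50.0))

E = np.arange(1.0, 7.0)                          # illustrative energies E_j = j
delta, lam = 10.0, 3.0
q_shift = (immediate_q(E, 2.0), immediate_q(E + delta, 2.0 + delta))
q_scale = (exploratory_q(E), exploratory_q(lam * E))
print(np.abs(q_shift[0] - q_shift[1]).max(), np.abs(q_scale[0] - q_scale[1]).max())
```

Both printed differences should be at the level of the bisection tolerance. The exploratory code is not shift-invariant in general: adding $\Delta$ to every energy multiplies $Z$ by $e^{-\beta\Delta}$, so a different $\beta$ solves $Z = 1$.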

3.2 The effects of noise 

In general, an energy-efficient code should suppress the use of expensive symbols. However, noise can have a dramatic effect, since conveying information requires the use of reliable symbols. In fact, the noisiness of a cheaper symbol can easily lead to its suppression relative to a more expensive, but reliable, symbol. This sort of effect is particularly important in applications to biological systems, and is illustrated in the toy examples below.

Consider a noisy channel in which six symbols $\{y_1, \ldots, y_6\}$ are transmitted as symbols $\{z_1, \ldots, z_6\}$ with channel crossover (noise) probabilities $Q_{k|j} = \Pr\{z = z_k \mid y = y_j\}$ as in Sec. 2. Furthermore, let the output symbol $z_n$ have a transmission energy of $U_n = n$. Then the average energy of the channel input symbol $y_i$ is $E_i = \sum_{n=1}^{6} U_n Q_{n|i}$. In the absence of any noise at all, $Q_{k|j} = \delta_{kj}$, so $E_i = U_i$, and the channel input and channel output distributions for the exploratory regime are both given by

$$\Pr(y_n) = \Pr(z_n) = \frac{e^{-\beta n}}{Z}, \qquad Z = \sum_{n=1}^{6} e^{-\beta n} = 1. \tag{12}$$

In other words, the channel input and output distributions are both exponential, and the weight in the exponential is determined by the condition $Z = 1$. In this case we find $\beta = 0.685$.

Next suppose that we have "nearest neighbour noise":

(13)

Here $Q_{k|j}$ is the entry in the $j$-th row and $k$-th column of the matrix $Q$. Fig. 3 shows the optimal exploratory regime distribution on channel output symbols, for several values of the noise parameter $p$. Notice the marked deviation of the optimal output distribution from a pure exponential as the noise increases. For $p = 0.25$, the least energetic symbol $y_1$, with $E_1 = 1$, is suppressed so strongly that it is less likely than symbol $y_2$, with $E_2 = 2$. Among the various intricate effects we have observed in the optimal distribution as a function of noise is a "phase-transition-like" behavior where the probability of a symbol evolves smoothly until the noise reaches some critical value, and then drops suddenly to essentially zero. Fig. 4 shows such effects for the input distribution to the channel (13).
In statistical physics, phase transitions occur due to tradeoffs between energy and
entropy. Physical systems at finite temperature try to minimize their energy but 
maximize their entropy, leading to sharp transitions, such as the melting of ice, at 
a critical temperature. In our case, information lost to noise decreases the mutual 
information between the channel input and output, and this reduction in mutual 
information competes against energy minimization in the optimization. The sharp 
transitions as a function of noise (Fig. 4) are a result of this tradeoff. Since biological 
signal processing systems are noisy, it is important for applications of our formalism 
that the noise be carefully measured and included in the model. 



4 Application to neural systems 

Our primary motivation in analyzing energy-efficient information transmission is to provide a formalism which can make quantitative predictions about the detailed structure of neural codes. To this end, we must identify circumstances in which the neural code can be thought of as a sequence of discrete symbols with distinct energies. Given such a set of symbols as well as a characterization of their transmission noise and energy cost, we can predict the unique symbol distribution that maximizes information transmitted per unit metabolic energy and compare this against the measured symbol distribution.

The vertebrate retina provides a particularly good example. Its input is a visual image projected by the optics of the eye; its output consists of easily measured action potentials. The optic nerve, which connects the eye with the brain, represents the visual world with many fewer neurons than at any other point in the visual pathway, suggesting that principles of efficient coding may be relevant. In addition, patterns of light with particular behavioral importance, for instance the image of a tiger, are distributed over many photoreceptor cells, the primary light sensors of the retina. This makes it difficult for any single retinal neuron to evaluate the behavioral significance of an overall image. Therefore, we expect that the value of the signal transmitted by a given optic nerve fibre is closely related to its information content in bits.

Previous studies [Berry et al., 1997], [Berry et al., 2000] have shown that ganglion cells, the output neurons of the retina whose axons form the optic nerve, often transmit visual information to the brain using a discrete set of coding symbols. In these experiments, the retina was stimulated with a wide variety of temporal and spatial patterns of light drawn from a white noise ensemble [Berry et al., 1997]. Under these stimulus conditions, ganglion cells responded with discrete bursts of several spikes separated by long intervals of silence. The reproducibility of these firing events was very high: the timing of the first spike jittered by ~3 ms from one stimulus trial to the next, and the total number of spikes varied by ~0.5 spike. This precision implies that each event is highly informative and that events with different numbers of spikes can reliably represent different stimulus patterns. In addition, correlations between successive firing events were very weak, implying that each firing event is an independent coding symbol that carries a discrete visual message.

This suggests that the size of each firing event (i.e., the number of spikes it contains) may be treated as a discrete symbol in the retinal code. A short duration of silence may likewise be discretized to a symbol 0. The experimentally measured sequence of retinal ganglion cell events, discretized in this manner, is represented in our model as the output sequence $Z$. In addition, $S$ is the visual stimulus to the retina, $X$ is the output of the photoreceptors, and $Y$ is an internal retinal variable representing the ideal retinal output prior to the addition of noise. Repeated presentations of the same stimulus produce a distribution of ganglion cell events with a sharp peak at a certain symbol, and a width that we attribute to noise. Interpreting the peak of the distribution as the intended noiseless output $y$, the distribution of actual ganglion cell outputs yields the channel noise matrix required by our model. Given a measurement or an estimate of the energy consumption by events of different sizes (see below), our framework then predicts a specific optimal distribution of event sizes. Comparison of this distribution against the experimentally measured event distribution is a quantitative check of the relevance of metabolically efficient coding to the retina.
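One way to build the noise matrix described above from repeated trials, as a sketch: for each repeated stimulus segment, take the most frequent event size as the intended symbol $y$ and histogram the observed sizes into that row. The data layout, the pooling of segments that share a peak, the function name and the toy numbers are all assumptions for illustration.

```python
import numpy as np
from collections import Counter

def noise_matrix_from_repeats(repeats, n_symbols):
    """Estimate Q[j, k] = Pr(observed event size k | intended size j).

    repeats: list of 1-D integer sequences, one per repeated stimulus segment, holding the
    event size (number of spikes, 0 = silence) observed on each presentation.
    """
    counts = np.zeros((n_symbols, n_symbols))
    for sizes in repeats:
        sizes = np.asarray(sizes, dtype=int)
        assert sizes.min() >= 0 and sizes.max() < n_symbols
        intended = Counter(sizes.tolist()).most_common(1)[0][0]   # peak = noiseless y
        counts[intended] += np.bincount(sizes, minlength=n_symbols)
    totals = counts.sum(axis=1, keepdims=True)
    # Rows for symbols that were never the peak of any segment stay all zero.
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)

# Hypothetical repeats for three stimulus segments (not experimental data).
repeats = [[2, 2, 3, 2, 1, 2], [0, 0, 0, 1, 0], [3, 3, 2, 3]]
Q = noise_matrix_from_repeats(repeats, n_symbols=4)
print(Q.round(3))
```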

More generally, our methods may be applied in any system where a suitable discretization of the neural code is available, along with a description of noise and costs. The all-or-nothing character of action potentials makes such discretization possible: by choosing an appropriate time bin, a spiking neuron's activity becomes a sequence of integer spike counts. The choice of time bin and of independent "codewords" will depend on the neuron being studied. The noise can be measured experimentally by repetition of an identical stimulus and observation of the resulting distribution of output symbols.
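A minimal sketch of the binning step, assuming spike times are given as an array in milliseconds and that the bin width has already been chosen; the spike train, the 10 ms bin and the function name are hypothetical. Each bin's count becomes one symbol, with 0 for silence.

```python
import numpy as np

def spike_counts(spike_times, bin_width, t_start, t_stop):
    """Integer spike count in each bin of width bin_width covering [t_start, t_stop]."""
    n_bins = int(round((t_stop - t_start) / bin_width))
    edges = t_start + bin_width * np.arange(n_bins + 1)   # n_bins + 1 edges, no float drift
    counts, _ = np.histogram(spike_times, bins=edges)
    return counts

spikes_ms = np.array([1.0, 3.0, 4.0, 12.0, 25.0, 26.0, 27.0])   # hypothetical spike times (ms)
symbols = spike_counts(spikes_ms, bin_width=10.0, t_start=0.0, t_stop=30.0)
print(symbols)
```

Edges are built from an integer bin count rather than `np.arange` with a float step, which avoids an occasional spurious extra edge from rounding.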

The symbol energy is more difficult to access experimentally. However, Siesjo [Siesjo, 1978] and Laughlin et al. [Laughlin et al., 1998] have argued that the dominant energy cost for a neuron arises in the pumps that actively transport ions across the cell membrane. If this is true, then the symbol energy can be found by simulating the known ionic currents in a neuron to find the total charge transported during different time periods, as this charge flow must be reversed by active transport in order to maintain equilibrium. Because ionic currents are large during an action potential, the symbol energy is likely to be given by a baseline metabolic cost plus an additional increment per spike, $E_N = 1 + bN$ for a symbol containing $N$ spikes, where $b$ is the ratio of spiking cost to baseline cost during the time bin. The baseline cost has components due to leak currents, synaptic currents, and other cellular metabolism. Estimates of $b$ vary, and depend on the neuron in question. While a variety of measurements indicate that electrical activity accounts for roughly half of the brain's total metabolism [Siesjo, 1978], the parameter $b$ may still be small. In any case, since cellular metabolism is difficult to estimate, and because it is unclear in the present context whether presynaptic and postsynaptic costs should be bundled into the expense of producing a spike, $b$ can be treated as a free parameter for each neuron, and varied to find the energy-efficient code that best agrees with the neuron's distribution of coding symbols.
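To illustrate treating $b$ as a free parameter, here is a noiseless sketch of the exploratory code for spike-count symbols with $E_N = 1 + bN$. It ignores channel noise entirely (a real comparison would use the measured noise matrix and the algorithm of Sec. 2.2), caps the count at an assumed $N_{\max} = 10$, and the $b$ values are illustrative, not estimates; `unit_Z_beta` is a hypothetical helper.

```python
import numpy as np

def unit_Z_beta(E, tol=1e-13):
    """beta > 0 with sum_j exp(-beta * E_j) = 1; needs >= 2 symbols, all E_j > 0."""
    E = np.asarray(E, dtype=float)
    def log_Z(b):
        a = -b * E
        return a.max() + np.log(np.exp(a - a.max()).sum())
    lo, hi = 0.0, 1.0
    while log_Z(hi) > 0:          # ln Z decreases in beta; find an upper bracket
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if log_Z(mid) > 0 else (lo, mid)
    return 0.5 * (lo + hi)

N = np.arange(0, 11)                    # spike counts per symbol, 0..10 (assumed cutoff)
for b in (0.1, 0.5, 2.0):               # illustrative spiking-to-baseline cost ratios
    E = 1.0 + b * N                     # E_N = 1 + b N, baseline cost set to 1
    beta = unit_Z_beta(E)
    p = np.exp(-beta * E)               # Z = 1, so p is already normalized
    print(f"b = {b}: beta = {beta:.3f}, P(N = 0) = {p[0]:.3f}, mean count = {p @ N:.2f}")
```

Larger $b$ makes each extra spike relatively more expensive, so the $Z = 1$ code shifts weight toward short events and silence; fitting $b$ would mean choosing the value whose predicted distribution best matches the measured one.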

Direct determination of metabolic activity is possible for an entire tissue by mea- 
surements of oxygen consumption or heat production. Furthermore, the metabolic ac- 
tivity of a single neuron could be obtained by measuring the uptake of a radioactively- 
labeled metabolic precursor, such as glucose, during stimulation of the neuron at 
different firing rates. Such measurements could fix or place bounds on the possible 
values of b.
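As a rough worked example of how such measurements could constrain b, suppose (an assumption, not a result stated above) that the metabolic rate rises linearly with firing rate r, $M(r) = M_0 + c\,r$, where $M_0$ is the baseline rate and c the cost per spike. A time bin of length $\Delta t$ containing N spikes then costs $M_0 \Delta t + cN$, which, normalized by the baseline cost, is $1 + bN$ with $b = c / (M_0 \Delta t)$. The sketch below fits $M_0$ and c by ordinary least squares; the function name `estimate_b` and the example numbers are hypothetical.

```python
def estimate_b(rates_hz, metabolic_rates, dt_s):
    """Estimate b = c / (M0 * dt) from paired (firing rate, metabolic rate) data.

    Fits M(r) = M0 + c*r by ordinary least squares (linear model assumed).
    metabolic_rates may be in any energy-per-second unit; c is then in the same
    energy unit per spike, and b is dimensionless.
    """
    n = len(rates_hz)
    if n < 2 or n != len(metabolic_rates):
        raise ValueError("need at least two paired measurements")
    mean_r = sum(rates_hz) / n
    mean_m = sum(metabolic_rates) / n
    sxx = sum((r - mean_r) ** 2 for r in rates_hz)
    if sxx == 0:
        raise ValueError("firing rates must not all be equal")
    sxy = sum((r - mean_r) * (m - mean_m) for r, m in zip(rates_hz, metabolic_rates))
    c = sxy / sxx               # incremental cost per spike
    m0 = mean_m - c * mean_r    # baseline metabolic rate at r = 0
    return c / (m0 * dt_s)      # per-spike cost relative to baseline cost per bin


# Hypothetical data: uptake rises from 100 to 110 units/s as firing goes 0 -> 20 Hz,
# with a 10 ms time bin; this gives b = 0.5.
b = estimate_b([0.0, 10.0, 20.0], [100.0, 105.0, 110.0], dt_s=0.01)
```

With noisy measurements, the uncertainty in the fitted slope and intercept would translate into a range of b rather than a single value.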

Summary: We have outlined how the formalism developed in this paper can be 
applied to real neurons, with particular emphasis on retinal ganglion cells. Discrete 
output symbols may be defined by counting the number of spikes produced within a 
fixed time window. The noise in each symbol can be experimentally measured, and 
the energy cost can be estimated. Finally, the optimal distribution of spike counts in a 
symbol can be computed using our methods and compared to the actual distribution 
used by the neuron. Such a test would determine whether the metabolic cost of 
information transmission is an important constraint in the structure of a neural code. 

5 Discussion 

We have described energy efficient codes in two different regimes: an immediate
regime, where a system's rate of information transmission is set by external
constraints, and an exploratory regime, where the total amount of information to be
transmitted is set by external constraints. The optimal codes in these cases are
closely related, both following a Boltzmann distribution in the symbol energies,
$p_j \propto e^{-\beta E_j}$, when there is no noise. In the immediate regime, the
inverse temperature, $\beta$, is set to yield the imposed information rate, while in
the exploratory regime, $\beta$ is set to make the partition function,
$Z = \sum_j e^{-\beta E_j}$, equal to one. With the addition of noise, the optimal
code must be obtained numerically, but can always be found using a straightforward
iterative scheme.
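A minimal sketch of such an iterative scheme, assuming the standard Blahut-Arimoto update with a Lagrange multiplier for the cost [Blahut, 1972] rather than the authors' own code: for fixed $\beta$ it maximizes $I(X;Y) - \beta \langle E \rangle$ over input distributions, given a noise matrix $Q_{jk} = P(\text{output } k \mid \text{input } j)$, with information in nats. For the exploratory regime it then bisects for the $\beta$ at which this maximum equals zero; by the standard argument for maximizing a ratio (Dinkelbach's method), that $\beta$ is the largest achievable information per unit energy. In the noiseless case ($Q$ the identity) the maximum equals $\ln Z$, so the condition reduces to $Z = 1$. Function names, tolerances and iteration limits are illustrative assumptions.

```python
import numpy as np


def blahut_arimoto_cost(Q, energies, beta, n_iter=5000, tol=1e-12):
    """Maximize I(X;Y) - beta * <E> over input distributions p.

    Q[j, k] = P(output k | input j) with rows summing to 1; energies[j] is the
    cost of input symbol j. Returns (p, I, E) with I in nats and E = <E>_p.
    """
    Q = np.asarray(Q, dtype=float)
    e = np.asarray(energies, dtype=float)
    p = np.full(Q.shape[0], 1.0 / Q.shape[0])

    def divergences(p):
        q = p @ Q  # output distribution induced by p
        with np.errstate(divide="ignore", invalid="ignore"):
            log_ratio = np.where(Q > 0, np.log(Q / q), 0.0)  # 0 log 0 = 0
        return np.sum(Q * log_ratio, axis=1)  # D[j] = KL(Q[j, :] || q)

    for _ in range(n_iter):
        a = divergences(p) - beta * e
        w = p * np.exp(a - a.max())  # subtract max for numerical stability
        p_new = w / w.sum()
        converged = np.max(np.abs(p_new - p)) < tol
        p = p_new
        if converged:
            break

    I = float(p @ divergences(p))  # mutual information I(X;Y)
    return p, I, float(p @ e)


def exploratory_beta(Q, energies, tol=1e-9):
    """Bisect for beta with max_p [I - beta*E] = 0 (requires energies > 0)."""
    def F(beta):
        _, I, E = blahut_arimoto_cost(Q, energies, beta)
        return I - beta * E

    lo, hi = 0.0, 1.0
    while F(hi) > 0:  # F decreases with beta; expand the bracket
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if F(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Usage: noiseless channel with three symbols of energies 1, 2, 3.
beta = exploratory_beta(np.eye(3), [1.0, 2.0, 3.0])
p, I, E = blahut_arimoto_cost(np.eye(3), [1.0, 2.0, 3.0], beta)
```

In this noiseless usage, $\sum_j e^{-\beta E_j}$ comes out equal to one within the bisection tolerance and $I/E$ is approximately $\beta$. For the immediate regime one would instead adjust $\beta$ until $I$ reaches the imposed rate.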

In delineating the immediate and exploratory regimes, we do not expect that 
all of an organism's behavior can be neatly assigned to one or the other category. 
Instead, we propose here that they apply to some behaviors. We have argued for 
an immediate regime in which the transmission rate is set by the need to respond 
rapidly to environmental pressures. However, there will certainly also be situations 
where the rate is determined instead by complex interactions involving the internal 
needs and constraints of the organism. 

There are also subtleties in identifying regimes of behavior that are "exploratory".
We have described an idealized situation where an organism acquires a certain
amount of sensory information before executing a single behavior. More realistically, 
the organism simultaneously acquires sensory information relevant to many possible 
behaviors, and the interplay between sensation and behavior is ongoing. This can be 
analyzed within our framework by determining the different amounts of optimal
information $I^*$ associated with each behavior, and then requiring that the total amount
of data be gathered simultaneously. The exploratory regime optimization continues 
to determine the total rate at which the information should be gathered. The es- 
sential point is that in this regime the organism's behavior is open-ended: it has 
sufficient time to choose a rate of sensory information acquisition that achieves en- 
ergy efficiency, while still being able to acquire enough information to make a "good" 
behavioral decision among the available choices. 
We have described how our formalism can be applied to a biological system,
like the retina. Our methods should also be useful in the analysis of low power 
engineered systems, such as mobile telephones or laptop computers which use discrete, 
independent coding symbols. In this case, the engineer controls the particular choice 
of coding symbols, as well as the design of the encoding algorithm and the transmission 
channel. The energy and noise characteristics of the channel can therefore be precisely 
determined as inputs to our theoretical analysis. Perhaps such an exercise will help 
in designing low power devices that can perform for longer times before running down 
their batteries. 

Acknowledgements: M.B. was supported by a National Research Service Award 
from the National Eye Institute. V.B. was initially supported by the Harvard Society 
of Fellows and the Milton Fund of Harvard University. V.B. and D.K. are grateful to 
the Xerox Palo Alto Research Center, the Institute for Theoretical Physics at Santa 
Barbara and the Aspen Center for Physics for hospitality at various stages of this
work. 

After this work was completed, we became aware that some of the results presented 
here have been obtained independently by Gonzalo Garcia de Polavieja. 



References 



[Arimoto, 1972] S. Arimoto. An algorithm for computing the capacity of an arbitrary
discrete memoryless channel. IEEE Trans. on Info. Theory, IT-18:14-20, 1972.

[Berry et.al., 1997] M.J. Berry II, D.W. Warland, and M. Meister. The structure and
precision of retinal spike trains. Proc. Natl. Acad. Sci. USA, 94:5411-5416, 1997.

[Berry et.al., 1998] M.J. Berry II and M. Meister. Refractoriness and neural precision.
Journal of Neuroscience, 18:2200-2211, 1998.

[Berry et.al., 2000] M.J. Berry II, D.W. Warland, and M. Meister. Firing events:
fundamental symbols in the retinal code. In preparation.

[Blahut, 1972] R.E. Blahut. Computation of channel capacity and rate distortion
functions. IEEE Trans. on Info. Theory, IT-18:460-473, 1972.

[Blahut, 1987] R.E. Blahut. Principles and Practice of Information Theory.
Addison-Wesley, Massachusetts, 1987.

[Buracas et.al., 1998] G.T. Buracas, A.M. Zador, M.R. deWeese, and T.D. Albright.
Efficient discrimination of temporal patterns by motion-sensitive neurons in the
primate visual cortex. Neuron, 20(5):959-969, 1998.






[Cover et.al., 1991] T.M. Cover and J.A. Thomas. Elements of Information Theory.
Wiley, New York, 1991.

[de Ruyter et.al., 1997] R.R. de Ruyter van Steveninck, G.D. Lewen, S.P. Strong,
R. Koberle, and W. Bialek. Reproducibility and variability in neural spike trains.
Science, 275:1805-1808, 1997.

[Laughlin et.al., 1998] S.B. Laughlin, R. de Ruyter van Steveninck, and J.C. Anderson.
The metabolic cost of neural information. Nature Neuroscience, 1(1):36-41, 1998.

[Levy et.al., 1996] W.B. Levy and R.A. Baxter. Energy-efficient neural codes. Neural
Computation, 8:531-543, 1996.

[Reinagel et.al., 1998] P. Reinagel and R.C. Reid. Visual stimulus statistics and the
reliability of spike timing in the LGN. Soc. Neurosci. Abstr. 24:139, 1998.

[Rieke et.al., 1993] F. Rieke, D. Warland, and W. Bialek. Coding efficiency and
information rates in sensory neurons. Europhys. Lett. 22:151-156, 1993.

[Rieke et.al., 1995] F. Rieke, D. Bodnar, and W. Bialek. Naturalistic stimuli increase
the rate and efficiency of information transmission by primary auditory neurons.
Proc. Royal Society of London Series B, 262:259-265, 1995.

[Rieke et.al., 1997] F. Rieke, D. Warland, R.R. de Ruyter van Steveninck, and
W. Bialek. Spikes: Exploring the Neural Code. MIT Press, Cambridge, Mass., U.S.A., 1997.

[Sarpeshkar, 1998] R. Sarpeshkar. Analog versus digital: extrapolating from
electronics to neurobiology. Neural Computation, 10:1601-1638, 1998.

[Siesjo, 1978] B.K. Siesjo. Brain Energy Metabolism. John Wiley and Sons, New York,
U.S.A., 1978.

[Strong et.al., 1997] S.P. Strong, R. Koberle, R.R. de Ruyter van Steveninck, and
W. Bialek. Entropy and information in neural spike trains. Phys. Rev. Lett.
80:197-200, 1997.

[Warland, 1991] D. Warland. Reading between the spikes: real-time processing in
neural systems. Dissertation, University of California at Berkeley, 1991.

[Warland et.al., 1997] D.K. Warland, P. Reinagel, and M. Meister. Decoding visual
information from a population of retinal ganglion cells. J. Neurophysiol.
78(5):2336-2350, 1997.






Captions 



Fig. 1: Schematic view of an information system. 

Fig. 2: Schematic of energy optimization. The information rate (thick line) is a
convex function of the energy rate until $E_{\max}$. The exploratory regime optimum
$(R^*, E^*)$ is given by the intersection of the tangent from the origin (thin line)
with $R(E)$.

Fig. 3: The effects of noise. Probability distribution of channel output symbols as
a function of increasing nearest-neighbour noise. The values of p and the associated
optimal $\beta$ displayed above are $\{p = 0, \beta = 0.685\}$, $\{p = 0.1, \beta = 0.420\}$,
$\{p = 0.2, \beta = 0.340\}$, and $\{p = 0.25, \beta = 0.317\}$.

Fig. 4: Sharp transitions in symbol probabilities due to noise. Shown here is the
probability of channel input symbols as a function of noise. Top row, left to right:
$y_1$, $y_2$, $y_3$; bottom row, left to right: $y_4$, $y_5$, $y_6$. Notice the different
vertical scales in each panel.